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' We use a three dimensional lattice model of proteins to investigate 

i ! 

systematically the global properties of the polypeptide chains that determine 



the folding to the native conformation starting from an ensemble of dena- 



& ' tured conformations. In the coarse grained description, the polypeptide chain 

a" 

is modeled as a heteropolymer consisting of N beads confined to the vertices 



of a simple cubic lattice. The interactions between the beads are taken from 
a random Gaussian distribution of energies, with a mean value Bq, that cor- 



^ ■ responds to the overall average hydrophobic interaction energy. We studied 



56 sequences all with a unique ground state (native conformation) covering 
two values of N (15 and 27) and two values of Bq. The smaller value of Bq 
was chosen so that the average fraction of hydrophobic residues corresponds 
to that found in natural proteins. The higher value of Bq was selected with 
the expectation that only the fully compact conformations would contribute 
to the thermodynamic behavior. For N = 15 the entire conformation space 
(compact as well as noncompact structures) can be exhaustively enumerated 
so that the thermodynamic properties can be exactly computed at all temper- 
atures. The thermodynamic properties for the 27-mer chain were calculated 
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using slow cooling technique together with standard Monte Carlo simulations. 
The kinetics of approach to the native state for all the sequences was obtained 
using Monte Carlo simulations. For all sequences we find that there are two 
intrinsic characteristic temperatures, namely, Tg and Tf. At the temperature 
Tg the polypeptide chain makes a transition to a collapsed structure while 
at Tf the chain undergoes a transition to the native conformation. We show 
that foldability of sequences can be characterized entirely in terms of these 
two temperatures. It is shown that fast folding sequences have small values of 
a = (Tg — Tf)/Tg, whereas slow folders have larger values of a (the range of a 
is < a < 1). The calculated values of the folding times correlate extremely 
well with a. An increase in a from 0.1 to 0.7 can result in an increase of 5-6 
orders of magnitudes in folding times. In contrast, we demonstrate that there 
is no useful correlation between folding times and the energy gap between 
the native conformation and the first excited state at any N for any value 
of Bq. In particular, in the parameter space of the model, many sequences 
with varying energy gaps all with roughly the same folding time can be eas- 
ily engineered. Folding sequences in this model, therefore, can be classified 
based solely on the value of a. Fast folders have small values of a (typically 
less than about 0.1), moderate folders have values of a in the approximate 
range between 0.1 and 0.6, while for slow folders a exceeds 0.6. The precise 
boundary between these categories depends crucially on N. The range of a 
for fast folders decreases with the length of the chain. At temperatures close 
to Tf fast folders reach the native conformation via a topology inducing nu- 
cleation collapse mechanism without forming any detectable intermediates, 
whereas only a fraction of molecules <3?(T) reaches the native conformation by 
this process for moderate folders. The remaining fraction reaches the native 
state via three stage multipathway process. For slow folders <&(T) is close to 
zero at all temperatures. The simultaneous requirement of native state sta- 
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bility and kinetic accessibility can be achieved at high enough temperatures 
for those sequences with small values of a. The utility of these results for de 
novo design of proteins is briefly discussed. 
Keywords: protein folding; lattice Monte Carlo simulations; kinetic accessibility and 
stability; kinetic partitioning mechanism; topology inducing nucleation collapse. 



I. INTRODUCTION 

The mechanisms, by which proteins adopt well-defined three dimensional topological 
structures, have been extensively investigated theoretically fl|-|6[ as well as experimentally 
|?]||. The major intellectual impetus for these studies originate in the so called Levinthal 
paradox ||. Since the number of conformations of even relatively short proteins is astro- 
nomically large Levinthal suggested that it would be impossible for proteins to reach the 
native conformation by a random search through all the available conformation space. In the 
last few years several groups |1]-§J have provided, largely complimentary, possible theoretical 
resolutions to the Levinthal paradox. The unifying theme that emerges from these studies, 
all of which are based on certain minimal model representations of proteins [|T0| — pT6|,|T8| — 127|1 
is that due to certain intrinsic preference for native structures proteins efficiently explore 
the underlying rough energy landscape. Explicit computations of the energy landscape for 
certain lattice models reveal that foldable sequences (those that reach the native con- 
formations on a relatively fast time scales, which for real proteins is typically of the order 



of a second) have relatively small free energy barriers. These and related studies |T^ , ^9|j29 
have emphasized the importance of the connectivity between various low energy states in 
determining the kinetic foldability of proteins. Thus, it appears that in order to fully elu- 
cidate the folding kinetics of proteins it is necessary to understand not only the low energy 
spectrum, but also how the various states are connected. 

It is of interest to wonder if natural proteins have been designed so that the requirements 
of kinetic foldability and stability have been simultaneously satisfied and if so how are they 
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encoded in the primary sequence. If this is the case, then it follows that because protein 
folding is a self-assembly process the kinetic foldability of proteins should be described in 
terms of the properties of the sequence itself. This argument would indicate that certain 
intrinsic thermodynamic characteristics associated with the sequence may well determine the 
overall kinetic accessibility of the native conformation. The minimal models are particularly 
suitable for addressing this question. For these models the folding kinetic rates for every 
sequence can be precisely calculated for small enough values of N - the number of beads in 
the model polypeptide chain. In addition the energy spectrum for small values of N can be 
explicitly enumerated for lattice models and for moderate sized proteins it can be computed 
by simulation methods. Thus, these models afford a systematic investigation of the factors 
governing folding rates. 

It is perhaps useful right at the outset to say a few words about the lattice representation 
of polypeptide chains. The energies employed in the minimal models (or in other knowledge 
based schemes) should be thought of as estimates of potentials of mean force after other 
irrelevant degrees of freedom are integrated out. In principle, this leaves one with an effective 
potential surface involving only the protein coordinates. In the minimal models one further 
coarse grains this force field by eliminating all coordinates except, perhaps, those associated 
with centers of the residues. The lattice models further confine these centers to the vertices 
of a chosen lattice. These arguments show that we can at best expect only qualitative 
themes to emerge from these studies. Nevertheless, these simulations together with other 
theoretical ideas have provided testable predictions for the kinetics of refolding of proteins. 

By using the random energy model (REM), originally introduced as the simplest mean 
field spin glass model |3(J, as a caricature of proteins it has been proposed that the dual 



requirement (kinetic accessibility of the native conformation as well as the stability) can 
be satisfied if the ratio of the folding transition temperature Tf to an equilibrium glass 
transition temperature T gm , at which the entropy vanishes in REM, is maximized [fH,ffT||. (It 
has been noted that in order to use this criterion in lattice models T gm has to be replaced 
by a kinetic glass transition temperature T g ^ in HH- It would be desirable to clarify the 



relationship between T gm and T 9t ki n )- 

Based partially on lattice simulations of proteins a plausible relationship between folding 
rates and the ratio of Tf to the collapse transition temperature Tg was conjectured a couple 
of years ago. In particular, Camacho and Thirumalai have argued that the fast folding 
sequences have small values of a = (Tg — Tf)/Tg [p5| . The theoretical reason for such an 
expectation has been given recently fl34| . The advantage of the criterion, based on the 
smallness of a to classify fast and slow folding sequences, is that both Tg and Tf are readily 
calculable from equilibrium properties. More importantly, one can deduce Tg and Tf directly 
from experiments. 



More recently, Sali et al. pO| , |2T| have forcefully asserted that for the class of minimal 
models of the sort described here the necessary and sufficient condition for folding (within a 
preset time scale in Monte Carlo simulations) is that the native conformation be separated 
from the first excited state by a large gap (which is presumably measured in the units of ksT 
with ks being the Boltzmann constant and T the temperature). However, their studies are 
incomplete and rest on untested assumptions. They restricted their conformation space to 
only a search among all compact structures of a 27-bead heteropolymer. More importantly, 
they did not provide for their model the dependence of the folding times as a function of the 
gap to establish the kinetic foldability of any sequence. Thus there is no direct evidence of 
the dependence of the folding times for various sequences and the gap A = Ei — E - which 
in the original study has been stated as a mathematical theorem. Notice that this gap has 
been defined as the difference between the two lowest energy levels assuming that both these 
correspond to compact conformations. It is, in fact, relatively straightforward to provide 



counter examples to this criterion casting serious doubts on the general validity of the 
strict relationship between gap and folding times. Moreover, it is extremely difficult, if not 
impossible, to obtain the value of the gap for proteins either experimentally or theoretically. 
Thus, the practical utility of this criterion for models other than lattice systems is limited 
at best. 

The major purpose of this study is to critically examine the various properties of se- 



quences (all of which have a unique ground state) that determine the kinetic accessibility 
and stability of lattice representations of proteins. This is done by calculating the folding 
rates and thermodynamic properties using N — 15 for a number of sequences. The qual- 
itative lesson from this case is verified by studying a smaller number of sequences for the 
much studied case of N = 27. The rest of the paper is organized as follows: In Sec. (II) the 
complete details of the model as well as simulation methods are discussed. The results of 
this study for a variety of cases are given in Sec. (III). The paper is concluded in Sec. (IV) 
with a discussion. 

II. DESCRIPTION OF THE MODEL AND SIMULATION TECHNIQUES 

A. Model 

We model proteins as chains of N successively connected beads (residues) located at the 
sites of an infinite cubic lattice (Figs. (1,2)). To satisfy self-avoiding condition we impose 
the restriction that each lattice site can be occupied only once (or remain free). The length 
of the bond between two residues is fixed and is equal to the lattice spacing a = 1. Any 
conformation (structure, in lattice terms), which the protein can adopt, is described by 
{i = 1,..,N) vectors with discrete coordinates x^y^Zi = 0,1,2..., which are the positions 
of residues on the lattice. We assume that the only interactions, contributing to the total 
energy of a protein structure, are those that arise due to the interactions between residues 
that are far apart along a chain. We further assume that the interactions are short ranged 
and can be represented by topological contacts between residues. A topological contact is 
formed when two nonbonded beads % and j (| i — j |> 3) are nearest-neighbors on the cubic 
lattice, i.e., | rj — ij |= a. Thus, the total energy of a protein E is given by the sum of the 
energies assigned to topological contacts found in a structure, i.e 

E= £ BySiry-a), (1) 

i<j+3 
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where Bij is the interaction energy between iih and j'th residues, which form a topological 
contact, Tij =| Ti — Tj | is the distance between them, and 5(0) = 1 and otherwise. In order 
to take into account the heterogeneity of interactions found in real proteins we assume that 
the interaction energies have a Gaussian distribution of the form [f20|-|23 



P(Bi 



exp 



Bin — Bn 



(2) 



(2tt j B 2 ) 1 /2 2B 2 

where Bo is the mean value and B is the standard deviation. This model that has been 
extensively investigated theoretically |]20|^l| , |23|j35| . It should be stressed that the models 
studied here and similar lattice models are at best caricatures of real proteins fl36| . The only 
objective of these studies should be to obtain qualitative behavior which hopefully shed light 
on the experiments. This, of course, requires extrapolating from these model systems to the 
behavior expected in proteins in terms of experimentally variable parameters. A tentative 
proposal for achieving this has recently been given p4| . 



B. Choice of Bo 

Since the actual energy scales are not known, we set B in Eq. (2) equal to 1 and 
thus all energies are expressed in the units of B. In contrast, we will demonstrate that 
the precise value of B Q (or more precisely the ratio B /B) plays a crucial role. Negative 
values of Bq favor random collapse of the chain as the temperature is lowered. In addition, 
the mean value Bq controls the nature of conformations that constitute the low energy 
part of the spectrum. The extensive full enumeration study of the conformational space 
for different sequences of various lengths N indicates that as the mean value B decreases 
structures with maximum number of topological contacts (compact structures, CS) start to 
dominate among conformations with minimum energies pl|j37| . Furthermore, a relationship 
can be obtained between the value of Bo and the ratio of hydrophilic and hydrophobic 
residues in a sequence. This would be relatively straightforward for random site models of 
the sort introduced recently [Q. For the random bond case, that is the subject of this 
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and numerous previous studies, the computation of the fraction of hydrophobic residues 
is somewhat ambiguous. Nevertheless, the following procedure can be used to calculate 
their fraction. Since it is natural to identify negative interaction energies as those between 
hydrophobic residues and positive ones to be between hydrophilic residues, one can specify 
boundary energies Bh and Bp (Bh = —Bp) in such a way that the energies B^ below Bh 
corresponds to hydrophobic interactions and the energies above Bp pertain to hydrophilic 
interactions. The energies Bij, lying between these boundaries, are associated with mixed 
interactions. If the number of hydrophobic and hydrophilic residues in a sequences is Nh 
and Np, respectively, (Nh + Np = N), the fraction of hydrophobic energies Xh among B^ 
is roughly (Nh/N) 2 . This fraction can be also obtained by integrating the distribution (2) 
from infinity to the energy Bh- The relationship between Xh and B may be obtained as 

\ H ~ (N H /N) 2 = / P(B lJ )dB iy (3) 

J — oo 

The precise value of Bh (and Bp) can easily be determined if one considers the case with 
B = 0, for which Nh = N P . Using Eqs. (2) and (3) we find that B H = -0.675 (for this 
particular value of B one quarter of all energies B^ are below the boundary energy Bh and 
one quarter - above Bp). It is known that in natural proteins hydrophobic residues make up 
approximately 54 percent of all residues in a sequence |39]]. For the distribution in Eq. (2) 



this implies that the mean value B should be approximately —0.1. Most of our simulations 
have been performed with this value of B . 

We have also performed a study of the sequences with B = —2.0 that in the language 
of sequence composition means that hydrophobic residues constitute about 94 percent of 
all residues. The motivation for choosing this value of Bq is the following. For Bq = —2.0 



the low energy spectrum becomes more sparse [21], because as mentioned above the main 
contribution comes from CS, whose total number is considerably less than the number of 
conformations with any other number of topological contacts c. Specifically, for iV = 15 the 
number of conformations having c = 11 is 3,848, c = 10 - 17,040, c = 9 - 97,216, c = 8 
- 313,868 etc. Studying the folding rates of the sequences having different B enables us 
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to assess the role of the available conformation space and the connectivity between various 
states in determining the kinetics of the folding process. The choice of B = —2.0 also allows 
us to compare directly our results to previous studies found in the literature pO|,|2T . 



C. Choice of N 

In order to fully characterize the folding scenarios it is necessary to understand the 
kinetics of approach to a native conformation in these models as a function of N and tem- 
perature. It has been shown recently that the folding of real proteins depends critically on 
N, the characteristic temperatures of the polypeptide chain (Tg,Tf, and perhaps the kinetic 
glass transition temperature T g ), viscosity, surface tension 7 etc. p^| . Thus in order to 



make the results of the minimal models relevant to proteins it is imperative to vary N in 
the simulations. 

Although one would like to understand the kinetic behavior of foldable heteropolymers 
for sufficiently large N this is currently computationally difficult. In the present study 
we have chosen N = 15 and N = 27. We chose N = 15 because for this value one can 
perform detailed kinetic study by including all conformations (compact and noncompact). 
A detailed study for three dimensional (3D) lattice models comparable to that undertaken 
for two dimensional (2D) systems has never been done H JT2]JT3|j40|1 . With this value of N one 
of the limitations of the study of Sali et al, who restricted themselves to compact structures 
only, can be overcome. From any theoretical perspective the qualitative difference in results 
between N = 15 and N = 27 should be insignificant. This is certainly the experience in 



simulations of polymeric systems J4(J. Thus, we expect that the qualitative aspects of the 
kinetics of folding should be quite similar for N = 15 and 27. This is, in fact, the case. 

One might naively think that for N = 15 the total number of conformations is not enough 
for folding times to exceed the Levinthal time, which is roughly the number of conformations 
of the polypeptide chain. The basis for this argument is that the conformation space of 
iV = 15 is considerably less than for N = 27. The total number of conformations of the 
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chain of N residues Cat or equivalently the number of all possible self- avoiding walks of N — 1 
steps on a cubic lattice is JIT] 

C N « a(N - ly-'Z^ (4) 

where Z e g = 4.684, the universal exponent 7 ~ 1.16, and a = o(l). For AT = 15 and 27 CV is 
approximately 7.77 x 10 s and 4.6 x 10 17 , respectively. Full enumeration (FE) of SAW which 
are unrelatedby symmetry for AT = 15 gives C[ 5 E = 93, 250, 730, which differs approximately 
from the number of all SAW by a factor of 48. Thus, C15 obtained from Eq. (4) and C[ 5 E 
are consistent. The chain of 15 residues can form 3,848 CS, which belong to 3x3x2 or 4x2x2 



tetragonals, whereas 27-mer chain adopts 103,346 CS with 28 topological contacts |35| . [lT 
all these CS are confined to 3x3x3 cube. 

From the above enumerations of the conformations it might be tempting to speculate that 
virtually all sequences for AT = 15 with a unique ground state should fold on the Levinthal 
time scale of 9.3 x 10 7 Monte Carlo steps (MCS). However, we find that for some of the thirty 
two sequences examined the maximum folding time can be larger than 10 9 MCS depending 
upon several characteristics (see below). Thus, even if the chain samples one conformation 
per MCS the bottlenecks in the energy surface can prevent the chain from reaching the 
native conformation. This implies that the number of conformations alone cannot determine 



folding times 2S . In fact, in the case of the models of disulfide bonded proteins it has been 



explicitly demonstrated that a significant reduction of available conformation space does not 



guarantee a decrease in folding times . Thus, kinetic foldability is determined by several 
factors and hence explicit studies of AT = 15, where full enumeration of all conformations 
is possible, should help us gain insights into the folding of small proteins. A comparison of 
the results for small values of A" is also useful in assessing finite size effects. 

The arguments given above together with explicit computations given here reject claims 



| 42|| that the study of short chains (N < 27 in three or two dimensions) are not of signifi- 
cance in illustrating the qualitative behavior of protein folding kinetics. Numerous studies 
have shown that it is not merely the size of the conformation space, but the connectivity 
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between conformations, i.e. the nature of the underlying energy landscape that allows one 
to distinguish between foldable and non-foldable sequences ||29|| . The obvious advantage of 
N = 15 is that systematic thermodynamic and kinetic studies (already performed for 2D 
chains of N up to 30) can be undertaken for 3D chains. 



D. Correlation Functions 

For probing the thermodynamics and kinetics of protein folding we use the overlap func- 
tion (considered here as an order parameter), which is defined as 



X=1 " ^-3Jy + 2 .jg./ (r ^" r ^ ) ' (5) 



where rfi refers to the coordinates of the native state. This function measures structural 
similarity between the native state and the state of interest: the smaller the value of x 
becomes the larger a given structure resembles the native one. Additional structural and 
kinetic information can be obtained using the function Q, which counts the relative number 
of native-like topological contacts in a structure ]TTJ, 



Q = £t> (6) 



r tot 

u 1l 



where c n is the number of native-like contacts in a given structure and c^°* is the total number 
of contacts found in the native structure. 

We have calculated the relevant thermodynamic properties such as the total energy 

< E >, the specific heat C v with 

C. = < E2 >~ T < E >\ (7) 
the function < Q >, and the Boltzmann probability of being in the native state 

P {Eo) = ^^>, (8 ) 

where Z = J2i Gxp(-PEi) and (3 = l/iksT) (k,B is set to 1 in our simulations). The brackets 

< ... > indicate the thermodynamic averages. In addition, the overlap function and the 
fluctuations in < x >, namely 
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Ax =< x 2 > ~ < X > 2 (9) 

were also calculated. The thermodynamic characteristics of the system can be exactly 
calculated for each sequence by exhaustively enumerating the various symmetry unrelated 
conformations for small enough values of N. In particular, we calculated these quantities 
exactly for N = 15. For N = 27 we used slow cooling Monte Carlo method to calculate 
the appropriate quantities of interest. The annealing simulation procedure is discussed in 
Appendix A. 

The parameter that distinguishes fast folding and slow folding sequences appears to be 
a = (Tg — Tf)/Tg, where Tg is the collapse transition temperature and Tf is the folding 
transition temperature [p3jp4| . It is known that even in these finite sized systems Tg can 



be estimated using the peak in the temperature dependence of C v (cf. Eq. (7)) p5| , |33| , ^3 
We have shown in previous studies involving both lattice and off-lattice models that the 
temperature dependence of the fluctuations in the overlap function, which serves as an 
order parameter, can be used to calculate Tf |£|[44[]. In particular, Tf corresponds to the 
peak in the function A%. 

For most sequences Tg and Tf are sufficiently well separated that an unambiguous deter- 
mination is possible by a straightforward computation of the temperature dependence of C v 
and A%. However, we have generated five sequences (out of 32 for N = 15, B = —0.1), for 
which C v or A% appear not to have well-defined single maximum due to specific arrange- 
ment of the energy states. For example, sequence 32 shows two maxima in the dependence 
C V (T) at Ti = 0.28 and T 2 = 0.73 of different amplitudes C V {T X ) = 8.25 and C V {T 2 ) = 13.22, 
respectively. In this case, we defined Tg as a weighted average over the temperatures T\ and 
T 2 

_ CyjTjT, + C V (T 2 )T 2 
CUT!) + C V (T 2 ) • 1 ' 

In other instances (e.g., for sequence 10), we found that although the dependence C V (T) 

has a single maximum at T max , it also has the interval (T',T") not including T max , wherein 

the derivative '^k- again approaches almost zero value that gives essentially unsymmetric 
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form to the peak of specific heat. We have also applied Eq. (10) for calculating Tg for such 
sequences by setting T\ = T max and T 2 ^ T max corresponds to the temperature, at which 
\^-\ has the smallest value within (T',T"). 

There are other ways of calculating Tg and Tf. For example, Tg could be directly inferred 
from the temperature dependence of the radius of gyration of the polypeptide chain. It has 
been shown in our earlier work on off-lattice models that the resulting values of Tg 
coincide with those obtained from the peak in the specific heat. The folding transition 
temperature is often associated with the midpoint of the temperature dependence of the 
probability of being in the native conformation. This estimate of Tf is in good agreement 
with the peak position of the temperature dependence of A%. In general, different order 
parameters can be used to calculate Tf. The resulting values are fairly consistent with each 
other. 



E. Sequence Design 

To create a database of different sequences for N = 15 we generated 60 random sequences, 
using the mean value B = —0.1 and 9 random sequences, using B = —2.0. For N = 
27 we generated 15 random sequences with B Q = —0.1 and 2 with B = —2.0. Note 
that the computational procedures for 15-mer and 27-mer sequences are similar, except 
that thermodynamic quantities for N = 27 are calculated from slow cooling Monte Carlo 
simulations. By enumerating all possible conformations for N = 15 we determined the 
energy spectrum for each sequence. The program for enumerating all the structures a protein 



can adopt on a cubic lattice is based on the Martin algorithm j|5| that is supplemented 
by the procedure that rejects all structures related by symmetry. This algorithm allows 
us to determine the lowest (native) energy state E , its degeneracy g, the coordinates of 
corresponding structure(s), and the number of topological contacts c for each sequence. The 
energy levels for 10 sequences are summarized in Fig. (3) for N = 15, Bq = —0.1 and in 
Fig. (4) for N = 27, B = —0.1. The spectra for N = 15 were obtained by enumerating 
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all possible conformations of the chain and arranging them in increasing order of energy. 
For iV = 27, on the other hand, the spectra for the various sequences are obtained by slow 
cooling Monte Carlo method, the details of which are reported in Appendix B. The results 
in Fig. (4) for N = 27 are instructive. For each sequence we show two columns. The 
left column gives the spectrum calculated by numerical method, whereas the right column 
is the spectrum that would be obtained if only the compact structures were retained. A 
comparison of the two columns for various sequences clearly reveals that for a majority of 
sequences the low lying energy levels are, in fact, noncompact. Thus, from this figure we 
would conclude that these noncompact structures would make significant contributions to 
various thermodynamic properties. This figure also shows that for the sequences whose 
native conformation is compact, the energy gap A C s calculated using compact structures 
spectrum alone exceeds the true gap. This appears to be a general result and can be 
understood by noting that the lowest energy excitations for such sequences are created by 
flipping surface bonds. The resulting structure would be noncompact and its energy would 
be lower or equal to that of other compact structures. Thus, it should in general be true that 
when the native conformation is compact, Acs > A. Since there are large scale motions 
that are expected to be involved in protein folding the physically relevant energy scale should 
be the stability gap [|I^,|l7j]. There is no straightforward relationship between the stability 
gap and Acs or A. We rejected all sequences with nonunique ground state from further 
analysis. 

In order to determine folding times for a range of a (= (Tg—Tf)/Tg) values sequences with 
varying spectral characteristics are required. It is known that a generic randomly generated 
sequence (even with unique native state) does not fold rapidly [1],@,|J. Thus, to expand the 
database of the sequences we used a technique proposed by Shakhnovich and Gutin P6|J4T|| 
to create a set of optimized sequences. We should stress that in our study this was used as 
merely a technical device to generate sequences that span a rather wide range of a. Here 
we will briefly present the idea of this scheme [If|H7]. One selects an arbitrary ('target') 



structure corresponding to a given initial random sequence. Then standard Monte Carlo 
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algorithm is applied in the sequence space. The first step of this scheme can be described in 
the following way. Two energies Bij and B k i of the initial ('old') sequence picked at random 
are interchanged, so that the topological contact between residues k,l has the interaction 



energy B' kl = B^ and the contact between residues i,j - the interaction energy B'^ = B 



hi- 



Thus, a new probe sequence, which differs from the initial one by the energies B[- and 
B' kl , is produced and is subject to Metropolis criterion. To do this, the energy of the new 
sequence E new fitted to the target structure is calculated and compared with E u of the 
initial sequence. If the new sequence provides lower energy at the target structure, it is 
unconditionally accepted and B^ = B'^, B^i = B' kl . If E a id < E new , the new sequence is 
accepted with the Boltzmann probability P = exp(— [E new — E i d )/T). If the new sequence is 
rejected, the initial sequence is restored. This permutation procedure, which does not alter 
the composition of the sequences, is repeated n times. The control parameter is temperature, 
and if it is sufficiently low, a series of sequences are quickly generated after relatively small 
number of MCS (10 4 ), whose energies when fitted to the target structure are remarkably 
low. By employing the full enumeration procedure (or Monte Carlo simulations for N = 27) 
one can verify that these energies are actually the lowest ones in the spectrum. In this 
manner, a series of new 'optimized' sequences are created. An application of optimization 
scheme allowed us to investigate essentially wider range of values of the parameter a than 
can be done by analyzing sequences produced at random. 

In all, for N = 15 we chose 32 sequences with B = —0.1 and 9 sequences with B = —2.0 
for detailed study. Among these with B = —0.1, 15 sequences were random and 17 - 
optimized. The native structures of these sequences were CS (e.g., structure (a) in Fig. (1)) 
as well non-CS (e.g., structure (b)) and included from 8 to 11 topological contacts. For 
B = —2.0 all sequences were optimized and as suspected the native conformations were 



all CS [21 j. For N = 27 we analyzed 15 optimized sequences with B = —0.1 and 2 with 
Bo = —2.0. As for N = 15 their native conformations were CS as well as non-CS with 21-28 
topological contacts (Fig. (2)). We believe that this choice of sequences is sufficient and we 
do not expect qualitatively different behavior for this model, if a larger database is selected. 
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Since our objective is to compare the rates of folding for different sequences it is desirable 
to subject them to identical folding conditions. The equilibrium value of < \ > measures 
the extent to which the conformation at a given temperature is exactly equivalent to a 
microscopic conformation, namely the native state. At sufficiently low temperature < x > 
would approach zero, but the folding time may be far too long. We chose to run our Monte 
Carlo simulations at a sequence dependent simulation temperature T s that is subject to two 
conditions: (a) T s be less than Tf for a specified sequence so that the native conformation 
has the highest occupation probability; (b) the value of < x(T = T s ) > be a constant for all 
sequences, i.e. 

< x (T = T s )>=a. (11) 

In our simulations we choose a = 0.21 and this value was low enough so that T s /Tf < 1 for 
all the sequences examined. This general procedure for selecting the simulation temperatures 
has been previously used in the literature jn|,[2l|. For N = 15 T s is precisely determined 
using Eq. (11) because < x(T = T s ) > can be calculated exactly using the full enumeration 
procedure. The simulation temperatures for N = 27 are calculated using the protocol 
described in Appendix A. 

F. Monte Carlo Simulations and Interpretation of Folding Kinetics 

In the present work we used standard Monte Carlo (MC) algorithm for studying folding 
of different sequences to their native states. The local simulation dynamics includes the 
following moves (Fig. (5)): (i) corner moves, which flip the position of jth residue across 
the diagonal of the square formed by bonds (j — and + 1); (ii) crankshaft rotations, 
which involve changing the positions of two successively connected beads j + 1 and j + 2 
(positions of j and j + 1 beads, which are nearest neighbors on a lattice, remain unchanged); 
(iii) rotations of end beads, in which the end bead (1 or N) moves to any of 5 adjacent sites 
to the beads 2 or iV — 1 (the sites previously occupied by the beads 1 or iV are not considered 
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as possible new sites). This particular set of moves has already been applied in Monte Carlo 
simulations [^l],^ ; for a discussion of the dependence of the kinetic results on the move set 
used in the Monte Carlo simulations see Ref. 13. In order to ensure maximum efficiency of 
exploring conformation space it is reasonable to assign different probabilities for the moves 
(i), (iii) and the move (ii). After probing several values we found that the probability 
p = 0.2 for the moves (i) and (iii) (involving single residue) provides the best efficiency of 
searching a native state of a test sequence by Monte Carlo algorithm. Qualitatively, it is clear 
that very small probability of moves (ii) depletes the ability of a chain to sample different 
conformations, while excessively high probability of their occurrence may deteriorate the 
ability to accept moves that would lead to acquisition of the last few native contacts when 
the chain is near the native conformation. 

For clarity of presentation let us now describe one step of the Monte Carlo algorithm. In 
the beginning, the type of move (single bead move ((i) or (iii)) or crankshaft rotation (ii)) 
is selected at random taking into account the probability p introduced above. After this a 
bead in the chain is chosen at random and the possibility of performing the selected move 
depending on the local configuration of the chain is established as follows. If there is a chain 
turn at the bead j selected for move (i) or in the case of move (ii) the beads j, j + 3 are lattice 
nearest neighbors, a move is accomplished; otherwise, one must return to selection of move 
type. Then a self-avoidance criterion is applied: if the move results in double occupancy of 
lattice sites, it is rejected and a new selection of move type has to be made. If self-avoidance 
criterion is satisfied, the new conformation is adopted and Metropolis criterion is used, i.e., 
the energies of old and new conformations, E a id, E new , are compared. If new conformation 
has lower energy, the move is accepted and a Monte Carlo step is completed. If E new is 
higher than E Q i d , then the Boltzmann probability P = exp(—(E new — E Q i d )/T) is calculated 
and compared with a random number £ (0 < £ < 1). If £ is smaller than P, the move is 
accepted and MC step is completed. Otherwise, the move is rejected and the previous MC 
step is counted as a new one. The fraction of accepted steps on average constitutes 5-15 
percents for the entire trajectory. In principle, using ergodic measures this percentage can 
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be adjusted a priori to maximize sampling rate . 



The initial conformations of all trajectories correspond to random extended coil ('infinite 
temperature' conformation). After a sudden temperature quench the chain dynamics was 
monitored for approximately 10 5 to 5 x 10 7 MC steps (MCS), depending on a folding kinetics 
of a given sequence. In order to obtain the kinetics of folding for a particular sequence the 
dynamics was averaged over M independent initial conditions. For example, < x(t) > is 
calculated as 

i M 

<x(i)>=»E#), (12) 

i=l 

where Xi{t) is the value of \ for the i th trajectory at time t. Another important probe of 
folding kinetics is the fraction of trajectories P u (t) which have not yet reached the native 
conformation at time t 

P u (t) = l- f P fp (s)ds, (13) 

J 

where Pf P {s) is the probability of the first passage to the native structure at time s defined 

as 

^00 = 17 ( 14 ) 

1=1 

In Eq. (14) tu denotes the first passage time for the i th trajectory. Similarly, other quantities 
were calculated. Typically, the number of trajectories M used in the averaging varied from 
100 to 800. We find that for smaller values one cannot get reliable results at all. If M as 
small as 10 is used one can obtain qualitatively incorrect results. The precise choice of 



M (which is sequence dependent) was determined by the condition that the resulting kinetic 
and thermodynamic properties should not change significantly with subsequent increase in 
M. The tolerance used in determining M was that the various quantities of interest converge 
to within 5 percent. 
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G. Computation of Folding Rates 



The most important goal of this paper is to obtain folding times for sequences at various 
temperatures. These times (or equivalently the folding rates) were calculated by analyzing 
the time dependent behavior of the dynamic quantities. It is clear that < x(i) > provides 
the most microscopic description of the kinetics of approach to the native state. The folding 
times reported here were obtained by a quantitative analysis of < x(t) >• For all sequences 
we find that after a transition time < x(t) > can be fitted as a sum of exponentials, i.e., 

< x(t) >= cli exp( — ) + a 2 exp( — -) (15) 

T TINC Tf 

In most cases, biexponential fit like in Eq. (15) gave the best approximation to the computed 
kinetic curves. However, for some sequences it was found that a single or three exponential 
fit were more suitable. The interpretation of the amplitudes 01,02 and the time constants 
i~TiNC, T f are discussed in Sec. (III. A. 3) . It must be noted that the kinetic curves for very 
slow folding sequences (those with large values of a) do not reach the required equilibrium 
values. However, even in these instances (6 out of 32 for iV = 15, B = —0.1) our simulations 
were long enough to observe the transition to the native conformation so that an accurate 
estimate of the folding time can be made. It should be noted that the trends in folding 
times remain unchanged if the mean first passage time is substituted for 77 (or t tinc ) in 
Eq. (15). 



H. Monitoring Intermediates in Folding Process 

Since the underlying energy landscape in proteins is thought to be rugged it is 

likely that there are low energy basins of attraction, in which the protein can get trapped in 
for arbitrary long times. Explicit construction of such landscape in lattice models, albeit in 
two dimensions, has revealed the presence of such states as important kinetic intermediates 
in certain folding pathways P5|,S5 . In our simulations we have used the following strategy 



to describe the nature of intermediates in the folding of the various sequences. We divided 
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each trajectory into two parts: the first part starts at the beginning of the trajectory (t = 0) 
and ends when the native structure is formed for the first time, i.e. when the first passage 
time Tu for the i th trajectory is reached. We labeled the trajectory for < t < tu as the 
relaxation part. The remaining portion of the trajectory for Tu < t < t max is referred to 
as the fluctuation part, where t max is the maximum time for which the computations are 
done for a given sequence. Using the trajectories corresponding to the relaxation regime we 
calculated for each sequence the probability of occurrence of the low energy states E k) i.e. 



where Et(t) is the energy at the i th trajectory at the time t. The state with the energy 
E k , which has the largest value of P r , is defined to be a kinetic intermediate for a given 
sequence. We also calculated the probability that this state occurs in the fluctuation parts 
of the trajectories. The fluctuation probability that the kinetic intermediates with the energy 
E k (for conformations other than the native one there could be more than one structure with 
the same energy) are visited after the transition to the native conformation is defined as 



where t max is the maximum time of simulation. This quantity was calculated to monitor if the 
chain, after reaching the native conformation, makes a transition to the same intermediate 
which was visited with overwhelming probability on the way to the native conformation. 



Since this section describes in complete detail the results for a variety of cases making it 
quite lengthy we provide a brief summary of its contents. The general methodology described 
in the previous section has been used to study in extreme detail the kinetics of folding for 
N — 15. For this value of N we have considered two values of B which sets the overall 
strength of the hydrophobic interactions. We have chosen B = —0.1 and B = —2.0. The 




(16) 




(17) 



III. RESULTS 
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former choice is a bit more realistic, while the latter was chosen to contrast the role of CS 
versus non-CS in determining the thermodynamics and kinetics of folding. As emphasized 
before the value of N = 15 is about the largest value of N for which exact enumeration studies 
are possible in three dimensions and for which the kinetics can be precisely determined in 
terms of all allowed conformations being explored. The results for N = 15 and for B = —0.1 
and B = —2.0 are presented in Sec. (III. A). In Sec. (III.B) we present the results for 
N = 27, for which we have also chosen B = —0.1. The thermodynamic properties with 
Bq = —2.0 are discussed as well. A comparison of N = 15 and N = 27 shows very similar 
qualitative behavior. 

A. N = 15: B = -0.1 and B = -2.0 

1. Thermodynamic Characteristics 

The two relevant temperatures Tg and Tf for each sequence are computed from the 
temperature dependence of C v and A%, respectively. In addition we have computed < Q > 
and < x > as a function of temperature. The midpoints in the graphs of these quantities can 
sometimes be used to obtain an estimate of Tf. In general, Tf obtained from the peak of Ax 
is smaller than that determined from the midpoint of < x > or < Q >■ The plots of < x >, 
Ax ,< Q >, and C v as a function of temperature are displayed in Fig. (6) for the sequence 
labeled 14. From the graphs of C v and Ax the collapse transition temperature T 9 and the 
folding transition temperature Tf are found to be 0.65 and 0.45 respectively. The simulation 
temperature, T s , is calculated using Eq. (11) and in this case it turns out to be 0.38. In Fig. 
(7a) we present the dependence of T s on the crucial parameter a = (Tg — Tf)/Tg. In addition 
we also display the correlation between T s and the energy gap A (Fig. (7b)). From these 
figures it appears that T s correlates well with a. Thus as far as the simulation temperature is 
concerned it appears that T s is decreasing function of a. This implies that if a is small then 
the simulation temperature can be made higher and fast folding can therefore be expected. 
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This is further quantified in Sec. (III. A. 4). We should emphasize that this correlation is 
only statistical in the sense that large (small) values of a yield small (large) values of T s . 
However given two values of o that are closely spaced it is not possible to predict the precise 
values of T s . 

Since the dimensionless parameter a serves to distinguish between foldable sequences and 
those that do not reach their native conformation on a reasonable time scale it is interesting 
to see if a can correlate with the spectrum of the underlying energy function. It has been 
argued that the only important parameter that is both necessary and sufficient to account 



for foldability is the gap A p0|,pi|,^2[ . The plot of o as a function of A is shown in Fig. (8a). 
In the lower panel (Fig. (8b)) we plot a as a function of the dimensionless parameter A/T s . 
This figure shows very clearly the lack of correlation between a and A. Thus, it is seen that 
no clear correspondence exists even in these models between a and A. This, of course, is 
not surprising because Tq is determined by the entire energy spectrum - most notably the 
higher energy non-compact structures. The results in Fig. (8a) show that large values of 
a appear to correspond well with small values of A. On the other hand small values of a 
simultaneously corresponds to both small as well as large values of A. 



2. Contribution to Thermodynamic Properties from Non-compact Structures 

The number of non-compact structures even for small values of iV (such as 15 and 27) 
far exceeds that of compact structures. It has already been mentioned that full enumeration 
of all self- avoiding structures in three dimensions becomes increasingly difficult for iV > 15. 
Thus in the literature it has been explicitly assumed that, in general, the native confor- 
mation in this model is maximally compact and that the thermodynamic properties can 
be determined using the spectrum of compact structures alone PD|J2T|J4^| ] . It has also been 
argued that the only relevant aspect of the energy spectrum that determines both kinetics 
and thermodynamics of folding in these models is the gap defined as 

A = E 1 -E , (18) 
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where E and E\ are the lowest energy and the energy of the first excited state in the 
spectrum. Since we have enumerated all possible conformations for N — 15 this can be 
explicitly checked by comparing various thermodynamic quantities computed exactly with 
those obtained by including only the contribution from compact structures. This has been 
done for several sequences and the typical results for two sequences are shown in Figs. (9) 
and (10). In Fig. (9) we present the results for < x > an d A% for B = —0.1. The relatively 
small but realistic value of -Bo makes the comparison between these quantities calculated 
using CS alone and the values calculated using full enumeration least favorable. From Fig. 
(9a) we find that T s found from full enumeration is roughly one half that obtained using 
compact structures enumeration (CSE) alone. In fact, T^ SE exceeds the collapse transition 
temperature T e which implies that if simulations are performed at this temperature the 
native conformation will not be stable at all. Fig. (9b) shows that Tj (the folding transition 
temperature) is once again one half that of T^ SE , indicating the importance of non-compact 
structures in this case. 

In Fig. (10) we show the behavior of < x > (T) and A%(T) for B = —2.0 for another 
sequence. In this case the agreement between the exact results and that calculated using 
CS alone is significantly better. The difference between the two is roughly on the order of 
ten percent. These calculations clearly show that even in these models the importance of 
non-compact structures is dependent upon the value of B . Only for large values of | B \ 
the low energy spectrum is dominated by CS alone. 

3. Folding Kinetics: Kinetics of Approach to the Native Conformation 

We begin with a discussion of the approach to the native state starting from an ensemble 
of disordered conformations. The kinetics of reaching the native state was monitored by 
studying the time dependence of < x(t) > averaged over several initial conditions. For 
all the sequences that we have examined we find that < x(t) > can be fit by a sum of 
exponentials after a transient time. In our previous studies using off-lattice models we had 
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shown that in general for a foldable sequence a fraction of initial population of molecules 



reaches the native state directly without encountering any discernible intermediates pH| , |50 
This is the case in these models as well, thus further supporting the conclusions of our 
earlier work which was based on Langevin simulations of off-lattice models. Since the data 
base analyzed here is more extensive, covering a wide span of a, we can further classify the 
meaning of the various exponential terms that arise in the time dependence of the overlap 
parameter. In order to classify the various sequences in terms of the rapidity of folding to 
the native conformation we have used the parameter a as a discrimination factor. 

(a) Fast folding sequences (a < 0.1): 

For these sequences the structural overlap function < x(t) > for t greater than a transient 
time is adequately fit by a single exponential, i.e., 

< x(t) >^ a f exp(-t/t TINC ). (19) 

In these cases the folding appears to be a two state all-or-none process and ry « r TINC) 
where ttinc is the time scale of topology inducing nucleation collapse (TINC). The folding 
and the collapse is almost synchronous. It has been shown in other studies that the folding 
for these sequences proceeds by a TINC mechanism P4]J5T1| -|5^| . In these studies the TINC 
mechanism was established by studying the microscopic dynamics of the trajectories. We 
found that once a critical number of contacts is formed (corresponding to a nucleus) the 
native conformation is reached rapidly. This fast folding is clearly observed when a is less 
that 0.1. 

(b) Moderate folding sequences (0.1 < a < 0.6): 

These are sequences when a single exponential fit cannot describe time course of the overlap 
function < x(t) >. We find that after an initial time, < x(t) > is well fit by two exponentials. 
The interpretation of the fast and slow processes have been given elsewhere in our studies 



of continuum models using Langevin simulations ||4||hJ. The range of a that characterizes 
moderate folding is 0.1 < o < 0.6. The onset of the intermediate values of a is easy to obtain. 
This can be inferred by the smallest value of o for which biexponential fit of < x(t) > is 
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required to describe the approach to the native conformation, 
(c) Slow folding sequences (a > 0.6): 

These are sequences with a > 0.6. When a gets close to unity, these sequences do not 
fold on any reasonable simulation time. If a is greater than almost 0.6 we once again find 
that multiexponential fit to < x{t) > is needed. The boundary between moderate and 
slow folding sequences is rather arbitrary. In both these cases we find that the various 
stages of folding can be described by a three stage multipathway mechanism (TSMM): 
The initial stage is characterized by random collapse, the second stage corresponds to the 
kinetic ordering regime in which the search among compact structures leads to native-like 



intermediates [0,0. The final stage corresponds to activated transition from one of the 
native-like structures to the native state, which is the rate determining step for folding. 
Thus, the transition states occur close to the native conformation as was shown some time 
ago in off-lattice simulations |16|]43|| . Our earlier lattice and off- lattice studies describe in 
detail the evidence for the TSMM. Analysis of the trajectories probing the approach to the 
native state (measured by x{t)) f° r the sequences studied exhibits similar behavior. 



4- Dependence of Folding Times on a 

In an earlier two dimensional lattice simulations we have suggested that the sequences 
that fold fast appear to have small values of a = (Tg —Tf)/Tg ||33|| . The reason for expecting 
the relationship between a and the folding time r/ has been given recently 0]. Physically 
if a is small then Tg « Tf and all possible transient structures that are explored by the 
chain on its way towards the native state are of relatively high free energy. Consequently, all 
the structures that are likely to act as traps or intermediates are effectively destabilized and 
thus folding to the native structure is rapid. For large a Tf and Tg are well separated and 
hence the chain searches many compact globular states in a rough energy landscape that 
leads to slow folding. When a is small the folding process and the collapse is synchronous 
and this leads to fast folding. Similar conclusions have been reached for 2D square lattice 
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proteins |]33|,|54 



In order to test the possible relationship between the folding time and a we have calcu- 
lated t f for the database of sequences generated by the method described in Sec. (II). The 
results of this simulations are plotted in Fig. (11) for N = 15 and Bq = —0.1 (solid circles), 
B = —2.0 (open circles). The figure shows that the folding time Tf correlates statistically 
extremely well with the intrinsically thermodynamic parameter a. The sequences span a 
range of a and consequently meaningful conclusions can be made. In fact, sequences with 
small values of a fold extremely rapidly (fast folding sequences): for o smaller than 0.1, Tf 
hardly exceeds 10 4 MCS. Most sequences (i.e., having a between 0.15 and 0.6) have inter- 
mediate folding rates extending from roughly 10 5 to 10 7 MCS (moderate folding sequences). 
The sequences with the largest values of a (greater than 0.6) are slow folding sequences, 
whose typical folding times are above 10 7 MCS. Thus, in the range of a examined the fold- 
ing rate changes by about four to five orders of magnitude. It is important to point out 
that all fast folding sequences are optimized, whereas moderate folding sequences also in- 
clude random ones. Slow folding sequences are exclusively random. This distinction based 
on folding times and its dependence on a was used in classification of the sequences in the 
discussion in Sec. (III. A. 3). 



5. Relationship between Tf and A 

Sali et al. have recently asserted (without providing explicit calculation of folding times) 
that sequences that fold rapidly and whose native conformation is also stable are charac- 
terized by large gap [fjf],[n[]. The gap in their model is defined using Eq. (18). In order to 
check this claim we plot Tf as a function of A in Fig. (12) for N = 15. The corresponding 
plot for iV = 27 is shown in the next section. Fig. (12) shows that this parameter is of 
little relevance when used to classify folding rates of different sequences. It appears that the 
sequences with large gaps A usually fold rather rapidly (about 10 4 — 10 5 MCS). However, 
sequences having a small energy gap A can have either very small folding times (below 10 4 
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MCS) or do not reach native state even after 10 8 — 10 9 MCS. For example, from Fig. (12) it 
is clear that if Tf is fixed at 10 5 MCS then we can, in principle, generate a large number of 
sequences with very small values of A to very large A all with roughly the same r/. Thus, A 
alone cannot be used to discriminate between fast and slow folding sequences. The existence 
of large values of A/kgT being a criterion for stability follows from Boltzmann's law with 
proteins being no exception. 



6. Kinetic Events in the Folding Process 

We have systematically investigated the microscopic processes that are involved in the 
folding of several sequences. This has been done by using the methodology for monitoring 
the intermediates described in Eqs. (16) and (17). We discuss the results of this study for 
the various sequences using the classification in terms of the parameter a. 
(a) Fast folders (a < 0.1): In this case there are no well defined intermediates in the 
sense that the chain gets trapped in a conformation that is distinct from the native state for 
any length of time. For these fast folding sequences we find that the native conformation 
is reached by essentially a TINC mechanism, i.e. once a certain number of critical native 
contacts is established then the native state is reached rapidly p4| , |50| , |5l| . By using a com- 



bination of Eqs. (16) and (17) we find that for fast folders the chain frequently visits the 
nearest low lying energy conformations even after reaching the native structure. For these 
sequences these low lying states are almost native-like (have about 90 percent of native con- 
tacts). In terms of the underlying energy landscape it is clear that they belong to the same 
basin of attraction as the native conformation. 

(b) Moderate and slow folders (a > 0.1): These sequences appear to have well defined 
intermediates and their significant role makes folding in this range of a quite distinct from 
the fast folders. In Fig. (13) we plot P r and Pfi (see Eqs. (16,17)) for a variety of sequences. 
The sequences are arranged in order of increasing folding time. Recall P r corresponds to 
the average probability that the intermediate with the energy has the highest proba- 
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bility of occurrence before the native conformation is reached for the first time and Pfi is 
the average probability that this state is revisited after the native state is reached. This 
graph shows several striking features: (i) For moderate folders there is a finite probability 
of the chain revisiting the same intermediate that it sampled in the approach to the na- 
tive conformation. This seldom happens in the sequences that fold slowly. It is clear that 
slow folding sequences have well defined intermediates which are not visited after the chain 
reaches the native conformation. These results suggest that the rate determining step in 
slow folding sequences is the transition from one of these intermediates to the native state. 
This involves overcoming a substantial free energy barrier. The existence of this barrier also 
prevents frequent excursions from the native state, (ii) It is of interest to probe the nature 
of intermediates that are encountered in the folding process. In Figs. (14a) and (14b) we 
show, respectively, the fraction of native contacts in the most populated intermediates and 
the corresponding overlap Xk with native conformation for all sequences. For both fast and 
moderate folders it is clear that the states that are sampled have great structural similarity 
to the native conformation. In fact, in this case these conformations have roughly 80 percent 
of the native contacts. However, slow folding sequences have only about 50 percent of native 
contacts in the intermediate structures. It also turns out that in the case of moderate folders 
the most populated intermediates prior to formation of native conformation is most often 
the first excited state whereas in slow folding sequences it is the higher excited states that 
have the largest probability of occurring, (iii) The rate of formation of the intermediates 
can be ascertained by examining Fig. (14c), in which the ratio of the mean time to reach 
the intermediate r k to the folding time T/ p is plotted. Intermediates for fast and moderate 
folding sequences usually occur at later stages of folding than for slow folding sequences. 
In fact, for the latter cases the ration r^/r^p can be as low as 0.1. This implies that these 
relatively stable misfolded structures are formed relatively early in the folding process and 
these off-pathway processes therefore slow down the folding considerably. It also follows that 
the rate determining step for slow folders occurs late in the folding process implying that 
the transition states are closer to the native structure [ 43| . 
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B. N = 27: B = -0.1 and B = -2.0 



For N = 27 we have generated 15 sequences with B = —0.1 using the optimized design 
procedure described in Sec. (II. E). In this case all of the sequences have been optimized so 
that a is in the range of a < 0.12. We have studied the thermodynamic properties of few 
sequences with B = —2.0. Since we have already established that non-compact structures 
make significant contribution to both the thermodynamic and kinetic properties for N = 15 
it is necessary to include them in studying the case of N = 27 as well. The number of 
non-compact structures for N = 27 is of the order of 10 18 and their exact enumeration is 
impossible. We have, therefore, used slow cooling Monte Carlo method (see Appendix A) to 
calculate the thermodynamic properties in this case. The calculation of the kinetic processes 
have been done as before for N = 15. We discuss there results below. Since the qualitative 
behavior remains the same we provide a less detailed account for this case. 

1. Thermodynamic Characteristics 

As before we have determined Tq and 7/ from computing the temperature dependence of 
C v and Ax- The simulation temperature was found by requiring that the overlap function 
< x(T s ) >— 0.21. These temperatures were calculated using Monte Carlo simulations. It 
is interesting to compare the results for the overlap function obtained from MC simulations 
with that calculated using CSE only (Figs. (15,16)). The results for the temperature 
dependence of Ax for one sequence with B = —0.1 is given in Fig. (15b) and in Fig. (16b) 
Ax as a function of T is plotted for another sequence with B = —2.0. The dotted line in 
these figures are the results obtained using the contribution of compact structures only. Both 
these sequences have large values of the energy gaps A. These figures show dramatically 
that the neglect of noncompact structures leads to serious errors in the determination of A\- 
Similar discrepancies are found for other thermodynamic quantities as well. The errors in 
the estimate of Tf in these two sequences are about a factor of 2 - 3. In fact, in both cases the 
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estimate of T^ SE obtained using compact structures alone exceeds the collapse transition 
temperature Tg. The neglect of noncompact structures, even for B = —2.0, results in more 
serious errors than for N = 15. In any event for both N = 15 and N = 27 restriction to 
compact structures alone can lead to incorrect results for thermodynamic properties. 

It is interesting that the discrepancy between the simulation temperature T S MC and 
is even more dramatic for iV = 27 at the value of Bq = —0.1 (Fig. 15a). In particular, for 
the sequence 61 for which A\(T) is displayed in Fig. (15b) T^ SE = 3.19, which is even 
larger than T$ = 1.22, exceeds T S MC = 1.17 by almost a factor of three. As expected 
for Bq = —2.0 the discrepancy is somewhat smaller but is still very significant (see Fig. 
(16a)). In this case = 3.60 is almost twice as large as T S MC = 2.06. The value of Tq 

for this sequence is T e MC = 2.14. This implies that, if the simulation temperatures 
are used, the only conformations that are thermodynamically relevant are the random coil 
ones. These observations clearly demonstrate that the use of only compact structures for 
calculating thermodynamic quantities is in general totally flawed and would lead to incorrect 
evaluation of the folding rates for both B = —0.1 and B Q = —2.0 for any N. It is worth 
noting that a similar conclusion has been reached for a two letter code model with N = 27 
[5S|. 

In Fig. (17) we collect the results for the ratio of the simulation temperature calculated 
using compact structures T^ SE to that computed using all available conformations T s . For 
the case of N = 27 we have already emphasized that the simulation temperature (denoted 
as T S MC ) is calculated usine; Monte Carlo simulations the details of which are examined in 
Appendix A. In Fig. (17a) we show the ratio T^ SE /T EE for N = 15, Bq = —0.1. It is clear 
that except for one sequence this ratio is greater than unity and is large as 2.5. The results 
Tg E jTf 1 for N = 27, B Q = —0.1 are displayed in Fig. (17b). Here the effects are even 
more dramatic. In all cases this ratio exceed 2.0 implying that noncompact structures make 
significant contributions in determining thermodynamic properties. 

A further consequence of using only compact structures to determine T s (as has been 
done elsewhere f20]2l]]) is that at these high temperatures the native conformation has no 
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stability. It is because of the very large values of T^ SE , Sali et al. find that in many cases 
their native conformations are not well populated. In fact, in most cases the probability of 
being in the native conformation is less than 0.1. 

2. Folding Kinetics: N = 27 and B = -0.1 

Since the time scales for folding for N = 27 are quite long we have restricted ourselves 
to determining the folding rates for optimized sequences only. Thus, we have examined 15 
sequences with characteristic temperatures Tg and Tf that provide the values of a being less 
than about 0.12. These sequences, according to the classification derived at detailed study 
of A = 15, would all be the fast folders. Thus, we expect that most of these sequences 
would reach the native conformation extremely rapidly. In these instances folding would 
appear to be a two state all-or-none process and the time dependence of < x(t) > should be 
exponential. All these expectations are borne out. In Fig. (18a) we show < x(t) > for one 
of the sequences from fifteen examined. It is obvious that < x(t) > is well fit by a single 
exponential process. 

For some sequences we do find that < x(0 > can De better fitted by a sum of two 
exponentials. Thus, for N = 27 even for these small values of a we find that these sequences 
should be classified as moderate folders (Fig. (18b)). This is not surprising because as 
N increases the probability of forming misfolded structure also increases. The boundaries 
differentiating the fast and moderate folders depend on the sequence length. For large values 
of ./V the range of a over which the sequences behave as fast folders decreases. Consequently, 
the partition factor $(T), which is the fraction of initial population of molecules that reaches 
the native conformation via TINC mechanism, decreases. 

3. Dependence of Folding Times on a, A, and Acs 

The dependence of the Tf on a is shown in Fig. (19). Even though we have examined only 
a small range of a the general trend that r/ is well correlated with a is clear. Considering 
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that one has statistical errors in determining both a and t/ the observed correlation between 
these quantities is, in fact, remarkable. In this limited range of a the folding time changes 
by nearly a factor of 300 indicating that small changes in o (which is an intrinsic property 
of the sequence) can lead to rather large changes in r/. The behavior of r/ on A is shown 
in Fig. (20). The trend one notices is the same as in the case of N = 15. In fact, the lack of 
any correlation between 77 and A is even more apparent here. As in the case for N = 15 it is 
possible to generate sequences with arbitrary values of the gap that would all have roughly 
the same folding time. Notice that although a covers only a small range the gap extends 
over a much wider interval. This also implicitly indicates no dependence of a on A. 

If there would be any plausible relation between Tf and A it is clear that A has to be 
expressed in terms of a suitable dimensionless parameter. Unfortunately the proponents 



of the energy gap idea pO| , pT| as the discriminator of folding sequences have used varying 
definitions of A in different papers without providing the precise way this is to be made 
dimensionless. The energy parameter that would make A dimensionless cannot be B, the 
standard deviation in the distribution of contact energies which merely sets the energy scale, 
because this would mean this criterion would apply only to this model. The gap A in Figs. 
(12) and (20) is measured in units of B. A very natural way to make A dimensionless is to 
divide it by ksT which is in this case ksT s . In Fig. (21) we have presented the folding times 
Tf as a function of A/T s . The upper panel is for N = 15, while the lower one is for N = 27. 
These figures demonstrate even more dramatically the irrelevance of A/T s as a parameter in 
determining folding times. In fact, this figure is almost like a scatter plot. Thus, it is clear 
that energy gap alone (measured in any suitable units) does not determine folding times 
in these models. From this it follows that the classification of sequences into fast and slow 
folders cannot be done using the value of A (measured in any reasonable units) alone. 

In the literature it has been forcefully asserted that foldability of sequences in this class 
of models is determined by Acs, the energy gap for the ensemble of compact structures 



2^ , 2"! ]. Plotting Tf as a function of Acs f° r our model appears to be somewhat ambiguous 



because about half of the sequences have non-compact native conformations. In Fig. (22) 
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we plot Tf as a function of Acs- The upper panel is for iV = 15 and the lower panel is for 
N = 27. It is clear from this figure that there is no useful correlation between Tf and Acs- 
It appears that one can generate sequences with a range of Acs all of which have roughly 
the same folding times. 

C. Kinetic Accessibility and Stability of Native Conformation 

It is well known that many natural proteins reach their native conformation quite rapidly 
without forming any detectable intermediates. However, proteins are only marginally stable 
in the sense the equilibration constant K for the reaction 

U # F (20) 

is only between 10 4 — 10 7 . In Eq. (20) U refers to the denaturated unfolded conformations, 
and F is the folded native state. Thus, AG = Gp — Gjj = —k B TlnK is in the range of 
— Ylk B T to —18k B T. While this is not as large as one observes in typical chemical reac- 
tions involving cleavage of bonds it is still sufficiently large so that the native conformation 
is overwhelmingly populated relative to the ensemble of unfolded states. The Helmholtz 
free energy of stabilization of the folded state with respect to the ensemble of denaturated 
conformations can be written as 

(3AF ~ —AE + S v (21) 

where AE is the stabilization energy, S v is the entropy of the ensemble of the unfolded 
structures. We have assumed that the conformational entropy associated with the native 
conformation is negligible. If the ensemble of denaturated structures corresponds to self- 
avoiding random walks Su for lattice models can be estimated using Eq. (4). For cubic 
lattice Z eff = 4.684, 7 = 1.16, and thus (3AF ~ -AE + 42 for N = 27. If we insist 
that (3AF ~ 10 this would imply that AE{T) pa —52k B T. The same calculations would 
show that AE(T) pa — 20k B T if the ensemble of structures in the denaturated states are 
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essentially compact structures. This estimate is a bit more realistic. These estimates show 
that the native conformation is overwhelmingly populated under appropriate conditions rel- 
ative to the compact structures. More recent experimental studies involving denaturant 
induced unfolding of few proteins have been used to probe the "spectrum" of low free en- 
ergy conformations [5B|. These tools, while being relatively primitive, suggest that typically 



the equilibrium intermediates are also about 6 — 8k bT higher in free energy than the na- 
tive conformation. Thus, even in these cases, under native conditions, the native state is 
overwhelmingly (> 0.9) occupied. 

Most of the simulations we have discussed so far have been done at temperatures below 
Tf but high enough that the < x(T s ) > is as large as possible. This, of course, has been done 
for computational reasons and the constant value of < x{T s ) > has been chosen so that the 
properties of different sequences can be compared on equal footing. However, at these sim- 
ulation temperatures T s the probability of occupation of the native conformation P nat (T s ) 
varies between 0.2 and 0.6, which is significantly smaller than what is observed in real pro- 
teins. Notice that the calculations of Sali et al. pU| , pT] have been done at such elevated 



temperatures that the probability of occupation of native conformation for two hundred se- 
quences examined is usually between 0.01 — 0.05 and none exceeds 0.4. Thus, these authors, 
although have stated the criterion for simultaneously satisfying kinetic accessibility and sta- 
bility (this follows from Boltzmann's law), did not provide any computational or theoretical 
evidence that any of their sequences obeyed the stated criterion at any temperature. 

In light of the above arguments it is necessary to use a different criterion for the choice 
of T s which would ensure stability of the native proteins. Accordingly, we have performed 
simulations for a few sequences with N = 15 and Bq = —0.1 at the temperatures which are 
determined using the following equation 

V {T SV ) = 1 - P nat {T sv ) = c (22) 

where P n at{T sri ) is the probability of the chain being in the native conformation at 
T = T sv . The constant c was chosen to be equal to 0.1, which implies that probabil- 
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ity of the chain being in the native conformation is 0.9. In order to present the con- 
trast between the kinetic behavior of the sequences at temperatures chosen using Eq. 
(22) T sv we chose three sequences one from fast folders (a ^ 0.1), one from moder- 
ate folders (0.1 < a < 0.6), and one from slow folders (a > 0.6). The ratios of the 
simulation temperatures for these sequences T SJ) determined using Eq. (22) to T s ob- 
tained using the overlap criterion are approximately 0.5, 0.6, and 0.6 for fast, moder- 
ate, and slow folders, respectively. It appears that stability of the native conformation 
(occupancy of this state > 0.8) for both fast and moderate folders can be achieved, if 
the simulation temperature T s is taken to be one half of the folding temperature T/. 

The quantity r](T) should only be the function of T/T ST] if the gap between the native 
conformation and the first excited state is large. More precisely, we expect this to be valid 
for all temperatures such that /c#T <C A, where A is the gap, separating the energy of the 
native conformation and the first excited state. To see this r/(T) can be written as 

V(T) * ' XP( 7^i ; (23) 
l + exp(-^) 

for A/k B T > 1. The temperature T sv is determined from Eq. (22) and thus r](T) becomes 

rj(T) = f(T/T sv ) ~ T |^ 77 , (24) 

where y — c/(l — c) and r = T/T srj . This is confirmed in Fig. (23), where plots of rj as a 
function of T/T srj are shown for three sequences. Two of them follow the behavior in Eq. 
(24) for T/T SJ] < 1.5, whereas the sequence with small gap (A/Zc^T^ < 1) does not. 

The behavior of < x(t) > and the fraction of unfolded molecules P u (t) as a function 
of time for two of the sequences is shown in Figs. (24) and (25). The temperature for 
the third sequence was so low for Eq. (22) to be satisfied that the folding time for this 
sequence was estimated to be in excess of 10 10 MCS. By studying these graphs we draw 
the following conclusions: (i) The overlap function %(t) at the low temperature (Fig. (24a)) 
is clearly biexponential, whereas at T = T s (see Eq. (11)) the folding process was an all- 
or-none process. This is because the very small barrier (5E$ ~ /c_bT s ) becomes discernible 
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at T ~ T sr) . This fact alone would yield to the prediction that Tf at T sr) should be about 
a factor of 10 larger. This is consistent with simulation results. So the emergence of the 
second component in P u (t) (Fig. (24b)) is due to the activated transition from the first 
excited state to the native conformation. The behavior of x(t) for the moderate folding 
sequence is qualitatively similar to that at T = T s except that the time constants are 
larger because T sv m |T S (Fig. (25)). (ii) We find that the ratio of the folding times for 
the three sequences at T = T sr] is roughly the same as found at T = T s . This suggest 
that although the overall folding times have increased considerably we do not expect to 
see qualitative differences in the fundamental conclusion regarding the statistical correlation 
between Tf and a . (iii) Examination of the fraction of unfolded molecules P u (t) shows that 



the biexponential behavior is consistent with the kinetic partitioning mechanism [}Pfl . A 
fraction of the molecules, $(T), reaches the native state very rapidly without forming any 
intermediates via TINC mechanism, while the remainder follows a more complex kinetic 
mechanism. The partition factor $(T) is nearly unity for fast folding sequences at T = Tf 
leading to a two state behavior, whereas at low temperatures $(T) < 1. For fast folding 
sequences shown in Fig. (24b) $(T = T srj ) is approximately 0.4. In the case of moderate 
folders $(T) is always less than one for all T < Tf. This is affirmed in Fig. (25b), where we 
find that $(T = T sv ) is approximately 0.19 for the sequence with o = 0.19. 



IV. CONCLUSIONS AND DISCUSSION 

In recent years there have been numerous studies of lattice models of proteins, in which 
the protein is modeled as a self-avoiding walk on a three dimensional cubic lattice. A 
variety of interactions between the beads on this lattice has been studied. In this work 
we have carried out a systematic investigation of the kinetics and thermodynamics of a 
heteropolymer chain confined to a cubic lattice under a variety of conditions. We have, 
as majority of the studies have in the past, used a random bond model to specify the 
interaction between the beads. Specifically the interactions between the beads are chosen 
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from a Gaussian distribution of energies with a non-zero value of the mean Bq. In order 
to obtain a coherent picture of folding in this highly simplified representation of proteins 
we have studied the kinetic and thermodynamic behavior for two values of N (the number 
of beads in the chain), namely N = 15 and N = 27. For N = 15 the thermodynamic 
characteristics can be exactly calculated because all possible conformations in this case can 
be exhaustively enumerated, whereas for N = 27 all the quantities of interest have to be 
obtained by Monte Carlo simulations. In addition the mean hydrophobic interaction Bq 
has also been varied in this work. The kinetics of folding for a number of sequences have 
been obtained by Monte Carlo simulations. The work reported here allows us to assess the 
various factors that govern folding in this model. In addition the variation in parameters 
can be used to address some of the proposals made earlier in the literature. 

The exact calculations for N = 15 of the thermodynamic quantities by enumerating all 
the conformations clearly demonstrate the importance of non-compact structures in deter- 
mining accurately the characteristic properties of the protein chain. For B = —0.1, which is 
a realistic value for proteins (see Sec. (II.B)), the values of the temperatures Tg (the collapse 
transition temperature) and Tf (the folding transition temperature) are much higher, if only 
the compact structure are included. This, in turn, makes the simulation temperature T s (see 
Eq. (11)) also quite high. In fact for most sequences T s determined using only the compact 
conformations exceeds Tq. Although the discrepancies between T s determined using CS and 
all the conformations for B = —2.0 (a rather unrealistic value for proteins) is lesser the 
resulting kinetics is significantly affected. However, for N = 27 the differences between T s 
found from CSE and that determined from Monte Carlo simulations are very significant 
even for Bq = —2.0. The two values in some instances differ by a factor of two. Thus if the 
simulation temperature is chosen using only compact structures that is much more conve- 
nient, then one obtains a value for T s that often exceeds the collapse transition temperature. 
This fact alone explains why the probability of occupancy of the native state was very small 
(in many cases less than 0.1) in the previously reported studies in the literature It 



has been pointed out by Chan |29j that the high temperatures used by Sali et al. pO|j2T|| 
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results in the equilibrium population of the native conformation for 'folding sequences' of 
only 0.01 — 0.05 with none exceeding 0.4. This was not appreciated by Sali et al. because 
they used a very crude criterion for folding. In particular a sequence was designated as a 
folding sequence if at the temperature of simulation the native conformation was reached 
once (a first passage time) at least four times among ten independent trajectories of maxi- 
mum duration of 50 x 10 6 MCS |[21|| . Our studies indicate that ten trajectories is absolutely 
inadequate for obtaining even qualitatively reliable estimate of any property of interest. 
We had shown earlier in lattice and off-lattice studies |28j33 , 44]] that the two temperatures 



that are intrinsic to a given sequence are Tg and Tf. This general thermodynamic behavior 
for protein-like heteropolymers has been confirmed recently ||55|| . In addition the temperature 
of simulation (or experiment) should be below Tf so that the folded state is the most stable. 



Theoretical arguments suggest |34| that the parameter a = {Tg — Tf)/Te is a useful indicator 
of kinetic foldability in the models of the sort considered here, and perhaps in real proteins 
as well. By examining the kinetics of approach to the native conformation we have found 
that roughly (this holds good for iV = 15) the kinetic foldability of sequences can be divided 
into three classes depending on the value of a . For N = 15 we find that fast folders have 
a less than about 0.1, moderate folders have a in the range of approximately 0.1 and 0.6, 
while a values greater than about 0.6 correspond to slow folders. It should be emphasized 
that the ranges for these three classes of sequences depend on the length of the chain: longer 
chains have smaller range of a for fast folding. Fast folding sequences can kinetically access 
their ground state at relatively higher temperature compared to slow folding sequences. The 
kinetics of approach to the native state differs significantly between fast folding sequences 
and those that are moderate folding sequences. In the former case the native conformation 
is reached via a TINC mechanism and there are no detectable intermediates. In this case 
the folding appears as an all-or-none process. The kinetics is essentially exponential. On the 
other hand the moderate folding sequences reach the native conformation (predominantly) 
via a three stage multipathway mechanism as was reported in our several earlier studies. 
The time dependence of the approach to the native conformation is very revealing. For 
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moderate folding sequences the simulation temperature is lower than for sequences with 
small a. Thus if approach to the native conformation with the same value of the overlap 
function is examined we find that the moderate folding sequences follow the kinetic parti- 
tioning mechanism (KPM) This implies that a fraction of the initial population of 
molecules reaches the native state via the TINC mechanism while the remaining one fol- 
lows the three stage multipathway process. The partition factor, $(T), that determines the 
fraction that follows the fast process is sequence and temperature dependent. Thus even 
fast folding sequences, which would have $(T) close to unity at higher temperature, would 
have fractional values less than unity at lower temperature. This is clearly seen in Fig. 
(24b). This work suggests that KPM should be a generic feature of foldable proteins. The 
partition factor can be altered by mutations, temperature, and by changing other external 
factors such as pH. 

We have explicitly calculated the folding times for a number of sequences for the various 
parameters (different N and B Q ) values. This is of great interest to examine whether there 
is any intrinsic property of the sequences that can be used to predict if a particular sequence 
folds rapidly or not. It has been argued based on the random energy model for proteins that 
folding sequences should have as large a value of Tf/T g as possible, where T g is an equilibrium 
glass transition temperature [0,0]. Kinetic studies of lattice models of proteins suggest that 



this criterion may be satisfied for foldable sequences provided T g is substituted by a kinetic 
glass transition temperature which is defined as the temperature at which folding time scale 



exceeds a certain arbitrary value ||21|| . Scaling arguments have been used to suggest that there 



should exist a correlation between folding times and a [p3| , |34]| . More recently it has been 
emphatically stated in the form of a theorem (without the benefit of explicit computations) 
that the necessary and sufficient condition for rapid folding in the models studied here is 
that there should be a large gap (presumably measured in units of /c#T) between the lowest 
energy levels. The calculations of folding times reported here clearly show that there is no 
useful correlation between the gaps and folding times for any parameter values that we have 
investigated. In fact for a specified folding time one can engineer sequences with both large 
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and small values of the gap. Thus the precise value of the gap alone cannot discriminate 
between folding sequences. On the other hand there is a good correlation between the folding 
times for sequences and a. It is clear that sequences with small values of a have short folding 
times, while those with larger values have higher folding times. It should be emphasized 
here that the criterion based on a should only be used to predict trends in the folding times. 
In this sense this criterion should be used in a qualitative manner. The major advantage 
of showing the correlation between a and the folding times is that a can be experimentally 
determined. The folding transition temperature is nominally associated with the midpoint 
of the denaturation curve while Tg is the temperature at which the protein resembles a 
random coil. 

In the usual discussion of protein folding only the issue of kinetic foldability of sequences 
are raised. Since natural proteins are relatively stable (this does not imply that the protein 
does not undergo fluctuations in the native conformation) with respect to both the equilib- 
rium intermediates and the ensemble of unfolded conformations it is imperative to devise the 
criterion for simultaneously satisfying kinetic accessibility of the native conformation and 
the associated stability. This paper for the first time has provided an answer to this issue. It 
is clear from our study that fast folders have small values of a, and consequently designing 
proteins [|3i|,[57}] using this criterion assures kinetic accessibility of the native conformation 
at relatively high temperature. For example, if Tg is taken to be about 60°C then the fast 
folders would reach the native conformation rapidly even at temperatures as 55°C (er rs 0.1). 
However, at these temperatures the native conformation may not be very stable, i.e. the 
probability of being in the native conformation may be less than 0.5. On the other hand, 
if these fast folders are maintained at T rs (0.5 — 0.6)7/ rs (30 — 35)°C for o r* 0.1 and 
Tg k, Q0°C then both kinetic accessibility as well as stability can be simultaneously satisfied. 

Our simulations also suggest that the dual criterion can also be satisfied by using mod- 
erate folders. In these cases we find that the native state is reached relatively rapidly at 
low enough temperatures (T Ri (25 — 30)°C assuming Tg R* 60°C) at which the excursions 
to other conformations are not very likely. In the extreme case of very slow folders we find 
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(see Fig. (26)) that the average fluctuation probability of leaving the native conformation 
after initially reaching it is small, i.e. the stability condition is easily satisfied. In these 
cases however the kinetic accessibility is, in general, not satisfied. From these observations 
it follows that in order to satisfy the dual criterion it is desirable to engineer fast folders 
(small values of a) and perform folding at temperatures around (0.5 — 0.6)Ty. Alternatively, 
one can use moderate folders at temperatures around 0.87/. Since moderate folders are 
more easily generated (by a random process) it is tempting to suggest that many natural 
proteins specifically large single domain proteins may be moderate folders. 

Finally, we address the applicability of the results obtained here to real proteins. Since 
many features that are known to be important in real proteins (such as side chains, the 
possibility of forming secondary structures etc.) are not contained in these models the 
direct applicability to real proteins is not clear. Nevertheless simulations based on other 
more realistic minimal models together with theoretical arguments can be used to suggest 
that the scenarios that have emerged from this and other related studies should be observed 
experimentally. In particular it appears that the kinetic partitioning mechanism which 
for most generic sequences is a convolution of the topology inducing nucleation collapse 
mechanism and the three stage multipathway kinetics should be a very general feature of 
protein folding in vitro. The theoretical ideas developed based on these minimal models 
also suggest that the partition factor $(T) is a property of the intrinsic sequence as well as 
external factors such as temperature, pH etc. In fact recent experiments on chymotrypsin 
inhibitor 2 (CI2) suggest that this KPM has indeed been observed f53|| . In this particular case 



Otzen et al. have observed that CI2 reaches the native state immediately following collapse. 



This, in the picture suggested here and elsewhere [|44j], would imply that in the case of CI2 
under the conditions of their experiment (T = 25°C and pH = 6.2) the native conformation 
is accessed via a TINC mechanism. Moreover, the partition factor $(T = 25°C) is close to 
unity making CI2 under these conditions a fast folder. If we assume that T = 25°C ~ Tf/2 
then it follows that for CI2 the value of a is roughly 0.15. On the other hand these authors 
have also noted that for barnase the rate limiting step comes closer to the native state 
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involving the rearrangement of the hydrophobic core. They suggest in the form of a figure 
(see Fig. (3) of Ref. 52) that barnase follows a three stage kinetics with the rate determining 
step being the final stage. In terms of the physical picture suggested here this can be 
interpreted to mean that for barnase $(T) is small making it either a moderate folder or 
even a slow folder. If this were the case our theoretical picture would suggest a for barnase 
is bigger which is a consequence of the fact that it is a larger protein. These observations 
are consistent with the experimental conclusions of Otzen et al. which are perhaps the 
first experiments that seem to provide some confirmation of the theoretical ideas that have 
emerged from the minimal model studies. It is clear that a more detailed comparison of 
the entire kinetics for various proteins under differing experimental conditions is required to 
fully validate the general conclusions based on the kinetic partitioning mechanism together 
with the classification of sequences based on the values of the parameter a. 
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APPENDIX A: 

It is clear from our studies for N = 15, for which all conformations (compact and 
noncompact) can be exactly enumerated that the neglect of noncompact structures leads to 
qualitatively incorrect results for thermodynamic and kinetic properties at all temperatures. 
However for N = 27 the enumeration of all possible conformations is out of reach. Thus 
one has to resort to numerical methods to obtain as accurate results as possible. We have 
used a combination of slow cooling technique in conjunction with Metropolis Monte Carlo 
method to calculate several thermodynamic characteristics of the system. We have used 
similar algorithms in our studies on continuum models of /3-barrel and four helix bundles 
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p4] , |58[| . In this appendix we describe the algorithms that have been used to obtain the 



collapse transition temperature Tg and the folding transition temperature Tf. 

For a given sequence we first estimate the approximate temperature range which includes 
Tg and Tf (and also T s , at which < x{T s ) >= 0.21). To obtain this temperature interval, T} < 
T s < Tf < Tg < Th, we quench the system starting from a random unfolded conformation 
(from an arbitrary self- avoiding conformation) to a high enough temperature Th- In the first 
phase of the algorithm a few trajectories are generated so that approximate estimates of Tg 
and Tf can be made. This allows us to pin the interval (Ti,Th). This temperature interval 
spans the peaks of C v and Ax from which the characteristic temperatures are obtained. 
Starting from a given initial conformation at Th the following linear cooling schedule was 
applied to lower the temperature 



where i labels the i th step in the slow cooling process and AT is the cooling rate. The initial 
conformation for the (i + l) th iteration is the final conformation at the end of the i th step. At 
each step of the slow cooling process we run Monte Carlo simulations for the time r max . For 
the computation of equilibrium quantities we reject the conformations for a certain length 
of time r eq immediately after quench to a new temperature. This allows for the chain to 
equilibrate at that temperature. The precise value of r eq clearly is temperature dependent. 
In our simulations we chose a large enough value of r eq so even at the lowest temperatures it 
is sufficient for equilibration. Thus at each temperature the time averaged property of any 
quantity of interest, A, can be calculated as 



where j labels the initial trajectory generated at Th. The true thermodynamic value 
< A(T) > is obtained by averaging over an ensemble of initial conditions, i.e. 



T t = T h - iAT 



(Al) 




(A2) 



< A(Ti) > 



1 M 
V1 3=1 



(A3) 
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In using Eqs. (A2) and (A3) we have assumed that the weights associated with the initial 
trajectories are uniform. This will only be valid provided there is not significant amount of 
multivalley structure in the potential energy surface. Since for N = 27 we have used only 
optimized sequences this assumption appears reasonable. Furthermore we have also checked 
that the same values of the thermodynamic properties are obtained in the limit of long 
times in the kinetic simulations as well. This again assures us that the computations based 
on the above procedure yield accurate values of all quantities of interest for the sequences 
examined. 

In our simulations we chose r eq = 50, 000 MCS, r max = 10 6 MCS, and the cooling rate 
AT = 0.01. This value of AT was small enough so that the changes in the thermodynamic 
quantities in between two successive temperatures were less than the fluctuations at these 
temperatures. Specifically this choice ensures us that < x(Tj) > — < x(T _ AT) 
(Ax(T)) 1 / 2 . After the determination of < x{Ti) >, < E(ty >, C„(T), Ax(T) for all 
i(— 0, 1, 2...) using the above protocol they were fitted to appropriate polynomials or in the 
case of < x(T) > to sum of two hyperbolic tangents. These functions were used to obtain 
Tq and Tf for fifteen sequences with B = —0.1 and for two sequences with B = —2.0. The 
later calculations were done merely to check the role of non-compact structures in deter- 
mining thermodynamic properties for N = 27 at a larger value of the overall hydrophobic 
interaction. 

APPENDIX B: 

In this appendix we describe the methods used to obtain the low energy spectrum of the 
sequences with N = 27. The sequences for this case were all optimized using the procedure 
described in Sec. (II. E). For a given primary sequence we first determined a set of low 
energy structures using the algorithm described in Appendix A. The lowest of these energy 
structure was then used as the target structure and an optimized sequence that is fitted to 
this target structure was generated. Typically the target structures contained between 22 to 
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28 (the maximum being 28) topological contacts. This procedure was repeated with another 
primary sequence until all the sequences are generated. 

The lowest energy obtained in the design procedure was assumed to be the ground 
state of the designed sequence. We did not, during the course of our simulations, find any 
counter examples to this. Thus the native energy is determined in the process of sequence 
design itself. In order to obtain the other energy levels we used a slow cooling Monte Carlo 
simulation starting from a high temperature TV The details of this protocol are given 
in Appendix A. During the course of this slow cooling simulations the energy E(t,Ti) is 
monitored for all values of t and T. These energies were then arranged in increasing order 
thus yielding the spectrum for one given sequence. In order to ensure that the energies are 
truly equilibrium spectrum this procedure was repeated for several distinct initial conditions 
(namely, for M = 100). The simulation time for each sequence ranged from 5 x 10 7 to 6 x 10 7 
MCS. 

The procedure described here works best for the determination of the gap A, when the 
temperature T% is low enough so that one essentially has a two level system. Since the 
sequences are all optimized most of the values of A are reasonably large that this never is 
the problem. There were few sequences with relatively small values of A. In these cases 
we also performed slow warming simulations starting from T\. Once again the energy as a 
function of time was recorded and we obtained exactly the same results for A. Thus it is 
extremely unlikely that there are any significant errors at least in the determination of A. 
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FIGURES 



Fig. 1. Native structures for two sequences with N = 15, B = —0.1. These structures 
are the lowest energy structures obtained by full enumeration of all the possible conforma- 
tions, (a) This corresponds to a sequence for which the lowest energy structure displayed 
is fully compact. It contains 11 topological contacts which are ternary interactions between 
between two beads that are separated by at least two other beads along the backbone, (b) 
An example of the lowest energy non-compact structure containing only 10 topological con- 
tacts. In both cases the native state is non-degenerate. Such contacts between non-bonded 
residues are shown as dashed lines. 

Fig. 2. Representation of the native structures for two sequences with N = 27, B = 
—0.1. The native structures were determined by a combination of slow cooling procedure 
and standard Monte Carlo algorithm. The details are given in Appendix A. (a) For this 
sequence the native structure is fully compact and contains 28 topological contacts, (b) 
An example of the native structure that is non-compact containing 22 topological contacts. 
Such contacts between non-bonded residues are shown as dashed lines. 

Fig. 3. Lower part of the energy spectrum for ten sequences. The value of N = 15 
and B = —0.1. The spectra were obtained by explicitly enumerating all the possible 
conformations of the chain for each sequence. The value of the energy gap separating the 
two lowest energy levels and the number of the sequence in the database are given below 
each spectrum. 
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Fig. 4. Lower part of energy spectra for 10 optimized sequences (N = 27, B = —0.1) 
labeled 61 through 70. The left columns of each sequence are the energy levels obtained from 
slow cooling Monte Carlo simulations. The columns on the right represent the energy spectra 
of compact structures. Below each pair of spectrum columns the energy gap A calculated 
from Monte Carlo simulations (upper quantity) and A C s calculated from the spectra of 
compact structures (lower quantity) are indicated. It is seen that for several sequences even 
the lowest energy in the spectrum of compact structures belongs to continuous part of the 
"true" spectrum. These spectra clearly show that in general non-compact structures are 
crucial for the thermodynamics (and kinetics). It is also obvious that Acs > A for the 
sequences with the native conformation being compact structure. This should be true for 
such sequences in this model for all N and all negative values of B . 

Fig. 5. The set of moves used in the Monte Carlo simulations: (i) Corner flip; (ii) 
Crankshaft rotation; (iii) Rotation of end residue. 

Fig. 6. Temperature dependence of thermodynamic quantities for sequence 14 (N = 
15, Bq = —0.1) calculated using full enumeration of all conformations: (a) overlap function 
< x > (T); (b) fluctuations in overlap function A%(T); (c) function < Q > (T); (d) specific 
heat C v . The peaks in the graphs of Ax and C v correspond to the folding transition and 
collapse temperatures, Tf, Tg, respectively. 

Fig. 7. (a) Dependence of the simulation temperature T s determined using Eq. (11) on 
the parameter a = (T e — Tf)/T e . T e and Tf are calculated from the temperature dependence 
of C v and A%(T), respectively, (b) Variation of T s with the energy gap A between the 
ground state and the first excited state for 32 sequences. The energy gap is determined 
from full enumeration of all possible conformations and is hence exact. The data are for 
(N = 15, S = -0.1). 
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Fig. 8. (a) The dependence of a(— (T e — Tf)/T g ) on the energy gap A for various 
sequences with (N = 15, B = —0.1). This figure shows that there is no trend between a 
and A because the determination of a requires the knowledge of the entire energy spectrum, 
(b) Plot of a versus A expressed in units of T s - the simulation temperature. This was done 
because for any useful purposes A has to be measured in dimensionless form. The relevant 
energy is A;bT s . The parameter B (see Eq. (2)) sets the energy scale in terms of which 
everything is measured. This figure shows even more dramatically the lack of correlation 
between a and A. 

Fig. 9. The upper panel (a) shows the overlap function < x > (Eq- (5)) as a function of 
temperature for the sequence labeled 10 with N = 15, B = —0.1. The solid line is obtained 
by calculating < x > using the enumeration of all conformations and the dotted line is the 
plot of < x > calculated using only the set of compact structures. The dashed line is given 
by the constant value of < x >— 0.21. The intersection of the solid line with the dashed line 
determines the simulation temperature, T s . The constant value of < x >— 0.21 is chosen 
so that T s < Tj for all sequences. The lower panel displays the temperature dependence of 
Ax- As in Fig. (9a) the solid line is the exact result obtained by full enumeration while the 
dotted line is the corresponding result for Ax using compact structures only. Note that the 
folding temperature Tf obtained from the peak of the solid line is considerably lower than 
the peak in the dotted line. For all the sequences examined Tf is also less than the midpoint 
of < x > or < Q > ( see Fig- (6c)). 

Fig. 10. Same as Fig. (9) except the curves are for a sequence labeled 91 which has 
N = 15, B = —2.0. In this case Fig. (10b) shows that the difference in Tf (i.e., between 
the peaks in the solid curve (full enumeration) and the dotted curve (compact structures 
only)) is less than for the case with B Q = —0.1. 
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Fig. 11. Dependence of the folding time 17 on a = (T e — Tf)/Tg for a number of 
sequences all with N = 15. The folding time for each sequence is obtained by fitting 
< x(t) > to an appropriate number of exponential functions as described in Sec. (II. G). 
The full circles correspond to B = —0.1 while the empty circles are for B = —2.0. There 
are 32 sequences with B = —0.1 and 9 sequences with B = —2.0. The trend from both 
these parameter values is clear. The smaller values of a have smaller folding times. The 
folding times decreases by nearly five orders of magnitude upon reducing a from 0.8 to 
around 0.1. For the same range of a the folding time is considerably greater for B = —2.0 
than for Bq = —0.1. 

Fig. 12. Dependence of the folding time 17 on the gap A which is the difference in energy 
between the native conformation and the first excited state. The database of sequences is 
exactly the same as that used in Fig. (11). As in Fig. (11) the full circles are for B = —0.1 
and the empty circles corresponds to B = —2.0. This figure shows that there is no useful 
correlation between tj and A. For a given folding time tj one can engineer sequences 
spanning a wide range of gap (both quite small and large). 

Fig. 13. The kinetic probabilities P r (E k ) (cf. Eq. (16)) (shown as empty bars) corre- 
sponding to the largest probabilities that a state with the energy E k occurs before the native 
conformation is reached. The sequences displayed are moderate folders (0.1 ^ a < 0.6) and 
slow folders (a > 0.6). All the sequences correspond to N = 15, B = —0.1. The sequences 
are arranged in increasing order of folding time. For most of the sequences with relatively 
large folding times there is a finite probability of visiting an intermediate. The fluctuation 
probabilities Pfi(E k ) (Eq. (17)), which probe the probability that the same kinetic inter- 
mediate is visited after the chain reaches the native conformation, are shown under hatched 
bars. These probabilities show that most slower folding sequences rarely visit the kinetic 
intermediate after initially reaching the native conformation at least on the maximum time 
scale of the simulations. 
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Fig. 14. Plots of various properties of the kinetic intermediates for the 32 sequences 
with N = 15, B = —0.1. The sequences are arranged in order of increasing folding time, 
(a) Fraction of native contacts in the intermediate. The sequences which fold fast have 
large fraction of native contacts in the intermediate 0.7) while slow folding sequences 
have typically less than 50% of native contacts. These are misfolded structures which act 
as traps thus slowing down the folding process, (b) The value of the overlap function Xk 
between the kinetic intermediate and the native conformation for the sequences arranged in 
order of increasing folding time. Since the structural overlap function is a more microscopic 
order parameter than Q this plot is more informative than Fig. (14a). The trends basically 
are the same. Intermediates for fast folding sequences have the smallest value of Xk (are most 
native- like) and slow folding sequences have large value of Xk- ( c ) Plot of the ratio of the 
average first passage time to reach the kinetic intermediate to the mean first passage time 
Tf p to reach the native conformation. The mean first passage time t/ p is obtained from the 
fraction of unfolded molecules P u (t) as Tf p = J °° P u (t)dt. The figure clearly shows that for 
relatively fast folding sequences the kinetic intermediate which is very native-like is reached 
on time scales comparable to Tf p . For slow folding sequences the kinetic intermediate, 
which does not share many features in common with the native conformation, is reached 
very rapidly. For these sequences the rate determining step is the transition from these 
structures to the native state. 
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Fig. 15. (a) The temperature dependence of the structural overlap function for a se- 
quence labeled 61 with iV = 27, B = —0.1. The results obtained by using the slow cooling 
Monte Carlo simulations (see Appendix A for details) are shown in squares. The solid line 
through these squares represents the fit of the simulation data by hyperbolic tangents. The 
dotted line represents exact calculation of the temperature dependence of < x > using only 
the ensemble of compact structures. The simulation temperature is determined from the 
intersection of the simulation results with the dashed line with < x >— 0.21. (b) The 
fluctuations in the overlap function Ax as a function of temperature. The squares represent 
the results from the slow cooling Monte Carlo simulations whereas the dotted line shows the 
corresponding results obtained using the contributions from the compact structures only. 
The solid line is a polynomial fit through the simulation data. 

Fig. 16. Same as Fig. (15) except it is for a sequence labeled 81 which has iV = 27, B = 
—2.0. In this case the compact structures are expected to dominate. Although the results 
here for the both the temperature dependence of < x > an d A% are closer to the simulation 
data than for sequence 61 (Fig. (15)) there are significant differences in the estimation 
of T s and Tf between the simulation results and those obtained from compact structures 
enumeration. This trend is seen for all sequences. 

Fig. 17. (a) Ratio of simulation temperatures obtained using compact structures only 
rpCSE j n con j unc tion of Eq. (11) to that obtained for the entire ensemble of conformations 
T s = T EE for iV = 15, B = —0.1 for a variety of sequences, (b) Same as Fig. (17a) except 
that the simulation temperature T s = T S MC * is obtained using slow cooling Monte Carlo 
simulations as described in Appendix A. These figures show that T^ SE is always greater 
than T s and often is much greater than T s . We have excluded those sequences for which 
the native conformation is non-compact. The sequences are arranged in order of increasing 
folding time. 
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Fig. 18. (a) The time dependence of the overlap function < x(t) > for the sequences 
labeled 69 with N = 27, B = —0.1. The simulation temperature is T s = 1.07 and the value 
of a ~ 0. The overlap function has been calculated using Eq. (12) by averaging over 200 
initial independent conditions. The solid line is a single exponential fit to the simulation 
data. In this case the folding appears to be an all-or-none process, (b) Same as (a) except the 
plot shows < x(t) > for the sequences labeled 65. The simulation temperature is T s = 0.77 
and the value of a — 0.075. The solid line is a biexponential fit to the simulation data. In 
this case folding appears to be a three stage multipathway process. 

Fig. 19. The variation of the folding time 77 with a = (Tg — Tf)/Tg for N = 27, B Q = 
—0.1. There are fifteen sequences for which tj has been calculated using the procedure 
described in Sec. (II. G). The basic trend here is the same as for N = 15 (see Fig. (11)). 
There is a dramatic increase in folding time as a changes from 0.02 to 0.1. Since all the 
sequences used were optimized the range of a examined here is considerably less than for 
N = 15. 

Fig. 20. Dependence of the folding time 77 on the gap A for the same set of sequences 
as in Fig. (19). The gap A, which is the energy difference between the native conformation 
and the first excited state, was calculated using the procedure given in Appendix B. It is 
clear that as for iV = 15 (see Fig. (12)) there is no useful correlation between 7/ and A. 

Fig. 21. Plots of the folding time 77 as a function of the energy gap A divided by the 
simulation temperature T s . (a) The solid circles correspond to B = —0.1 whereas the open 
circles are for B = —2.0. The value of N is 15. (b) This plot is for iV = 27, B = —0.1. 
These figures reaffirm even more emphatically the lack of any relationship between 77 and 
the energy gap expressed in suitable dimensionless units. 
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Fig. 22. Dependence of the folding time t/ on the energy gap Acs- (a) The solid circles 
correspond to B = —0.1 and open circles are for B = —2.0. The value of N = 15. The 
database of sequences is the same as in Fig. (11). (b) This is for N = 27 and B = —0.1. 
The database of sequences is the same as in Fig. (19). These figures show that the folding 
times do not correlate with the energy gap even when restricted to the ensemble of compact 
structures. 

Fig. 23. Temperature dependence of rj (cf. Eq. (22)) which gives the probability that 
the chain is not in the native conformation. Notice that the temperature is measured in 
units of T/T srj . For sequences with a large enough gaps (i.e. A/ksTg ^> 1) r)(T) is only a 
function of T/T sr] (see Eq. (24)). This is verified for the sequences labeled 10 and 62 for 
T/T sv < 1.5. The temperature T srj is determined from the intersection of rj(T) with the 
straight dashed line giving the constant value of rj = 0.1. Thus, at these temperatures the 
probability of being in the native conformation is 0.9. 

Fig. 24. (a) Time dependence of the overlap function < x(t) > for the sequence labeled 
62 with N = 15, -B = —0.1. The simulation temperature T sr} = 0.47 was determined using 
Eq. (22). The equilibrium value of < x > from exact enumeration is 0.028 at T = T sr] . 
At this temperature, at which the probability of the chain being in the native state is 0.9, 
< x(t) > shows a biexponential behavior. This fast folding sequence (a ~ 0.0) exhibits 
an exponential time dependence at T = T s = 0.76. (b) Plot of the fraction of unfolded 
molecules P u (t) (see Eq. (13)) as a function of the number of Monte Carlo steps. The solid 
line is a biexponential fit to the simulation data. At this low temperature a fraction of 
molecules get trapped in the kinetic intermediate and the transition from this intermediate 
to the native conformation represents the slow step in the folding process. 
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Fig. 25. (a) Same as Fig. (24a) except it is for the sequence labeled 10, whose a is 
0.19. This is a moderate folding sequence in our computational scheme. The dashed line 
corresponds to the equilibrium value of < x > at T = T sr] = 0.25. On the time scale of the 
simulation (3 x 10 7 MCS ) the equilibrium value is not reached, (b) Time dependence of 
the fraction of unfolded molecules P u (t). It is clear that P u (t) has not decayed significantly 
on the time scale of the simulations. The simulation temperature T sr) is so low that only 
24% of the initial population of molecules has reached the native state on the time scale of 
3 x 10 7 MCS. 

Fig. 26. Probability that the native conformation is populated P(E ) for 32 sequences 
with iV = 15, -B = —0.1 at the simulation temperatures T s (cf. Eq. (11)). The sequences 
are arranged in order of increasing folding time. The probability ranges from about 0.2 to 
0.6. It is clear that some of the slow folding sequences make less frequent transitions to 
other states from the native state whereas the fast folding sequences often visit native-like 
conformations. 
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